JBoss Community Archive (Read Only)

RHQ 4.9

Design-HA Use Cases

single-server setup

note: the general goal here is to ensure that HA changes for multiple servers have not introduced installation / upgrade regressions for single-server setups

  • start with existing (upgrade) or new (overwrite) db

  • use installer to add server: name it server-1, leave default affinity group option selected

  • test behavior of old agent trying to connect to new server

    • should fail and report protocol version mismatch error

cache load after agent registration

  • register a new agent agent-1 to this server

  • tail the server log and make sure the cache gets loaded for this agent

  • using HAAC agent view, verify agent-1 failover list is server-1 only.

cache load after agent reconnect

  • stop the agent

  • start the agent (without "--clean")

  • tail the server log and make sure the cache gets reloaded for this agent

cache load after server failure

  • kill the server

  • wait several minutes for agent data to spool up

  • restart the server

  • tail the server log and make sure the cache gets reloaded for this agent

    • note: the cache must be loaded BEFORE any agent reports are sent
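
The ordering requirement in the note above (cache loaded BEFORE any agent reports) can be checked mechanically instead of by eye. The following is a minimal sketch only: the two log patterns are hypothetical placeholders, since this page does not give the actual server log messages, and must be replaced with the text seen in your own server log.

```python
import re

# Hypothetical log patterns (assumptions, not real RHQ messages): substitute
# the text that actually appears in your server log.
CACHE_LOADED = re.compile(r"cache loaded for agent \[(?P<agent>[^\]]+)\]")
REPORT_RECEIVED = re.compile(r"report received from agent \[(?P<agent>[^\]]+)\]")


def cache_loaded_before_reports(log_lines, agent):
    """Return True only if a cache-load line for `agent` appears before
    the first report line for that agent."""
    for line in log_lines:
        match = CACHE_LOADED.search(line)
        if match and match.group("agent") == agent:
            return True  # cache load seen first
        match = REPORT_RECEIVED.search(line)
        if match and match.group("agent") == agent:
            return False  # a report arrived before the cache was loaded
    return False  # cache load never seen


if __name__ == "__main__":
    sample = [
        "INFO cache loaded for agent [agent-1]",
        "INFO report received from agent [agent-1]",
    ]
    print(cache_loaded_before_reports(sample, "agent-1"))
```

Feed it the server log lines written after the restart; a False result means either the cache load was never logged or a report was processed first.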

cache load after server maint mode

  • put the server into MM

  • tail the agent log to ensure the agent is initiating failover

  • put the server into NORMAL mode

  • tail the server log and make sure the cache gets reloaded for this agent

    • note: the cache must be loaded BEFORE any agent reports are sent

verify HA console affinity group view

  • register 5 more agents to the server (for a total of 6)

  • go to HA admin console, "affinity groups" section

  • ensure that servers and agents listed all have an empty affinity group value

  • using HAAC agent view, all 6 agents have failover list with server-1 only.

  • using HAAC server list view, server-1 should show agent count of 6 (assuming all agents are running)

verify server HA tracking data

  • log into the application as the superuser

  • go to HA admin console, "servers" section

  • verify that there is the correct # of servers

    • exposed at the proper IPs/ports?

    • in the proper affinity groups?

    • properly shows which agents are connected to which servers?

    • do you see which servers are up/down correctly?

  • also, repeat "verify HA console affinity group view"

verify agent HA tracking data

  • go to HA admin console, "agents" section

    • ensure affinity group information matches expected values for all agents listed

  • verify agent resource configurations

    • go to configuration>current subtab for each agent, verify the affinity group configuration property is correct for that agent

    • verify the most recent configuration update history reflects the correct value for each agent

  • also, repeat "verify HA console affinity group view"

HA console-based affinity group update

  • go to HA admin console, "affinity groups" section

  • click edit

  • use UI controls to move servers and/or agents into different affinity groups

  • click save

  • verify results (there will be some redundancy since verify server/agent HA data tasks will both require verifying the data on the HA admin console, "affinity groups" section)

    • repeat "verify server HA tracking data"

    • repeat "verify agent HA tracking data"

ensure agent distribution

  • log into HA admin console

  • click re-partition button

  • use other HA console UI pages to ensure that:

    • the agents are distributed correctly across all servers

      • If affinity is in use the distribution may not be even. Satisfying affinity is weighted more highly than even distribution. See HA Load Balancing for more detail.

    • the failover list for each agent includes all servers (no servers should be listed in duplicate)
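
The failover list check above (every server present, none listed twice) can be sketched as a small helper. This assumes the failover lists have been copied by hand from the HA admin console into a dictionary; the function name and the data layout are illustrative, not part of RHQ.

```python
def check_failover_lists(failover_lists, servers):
    """Return {agent: [problem, ...]} for every agent whose failover list
    has duplicate entries or is missing a server. Empty dict means all OK."""
    expected = set(servers)
    problems = {}
    for agent, failover in failover_lists.items():
        issues = []
        if len(failover) != len(set(failover)):
            issues.append("duplicate servers")
        missing = expected - set(failover)
        if missing:
            issues.append("missing: " + ", ".join(sorted(missing)))
        if issues:
            problems[agent] = issues
    return problems


if __name__ == "__main__":
    lists = {
        "agent-1": ["server-1", "server-2"],
        "agent-2": ["server-2", "server-2"],  # duplicate, and server-1 missing
    }
    print(check_failover_lists(lists, ["server-1", "server-2"]))
```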

add server to the cloud

note: this performs an install using the same installer, but with different options. since you'll be configuring this server against the same existing database, this will be an HA install. it tests a different installer path, and also introduces server-side HA principles that need to be tested.

  • use installer, and point to the same database / port / dbuser you did earlier

  • name this instance server-2 (should run on a different machine or, minimally, the same machine and different port)

  • verify that the web UI can be reached from any server endpoint

  • repeat "ensure agent distribution"

    • Installing a server to the cloud will repartition the agents. Note that affinity will still be satisfied so previous agent affinity should still hold, creating an uneven distribution.

    • Note that affinity assignment changes also will repartition the agents.

simulate server crash

  • take down one of the servers - shutdown operation (graceful) or kill it

  • repeat "verify server HA tracking data"

  • repeat "ensure agent distribution"

    • A server going down, or going into maintenance mode, does not trigger a repartition. Server lists remain the same and agents will fail over using their existing lists.

server maintenance mode

  • log into HA admin console, "servers" section

  • take an enterprise-wide snapshot of which agents are connected to which servers

  • click "maintenance mode" button next to any server

  • repeat "ensure AG-aware agent load distribution", but do not click the re-partition button

    • note: a full redistribution is not performed at this time. Only the agents connected to the server that went into maintenance should fail over to their secondaries. Validate this by taking another enterprise-wide snapshot of which agents are connected to which servers and comparing it against the previous snapshot.

  • click the "normal" button for the same server again (ending the maintenance period)

  • repeat "ensure AG-aware agent load distribution", this time click the re-partition button as normal

    • Cloud member operation mode changes (going up or down, or in and out of maintenance mode) don't really affect the agent distribution algorithm. Server lists will include cloud members that may be temporarily unavailable. So, re-partitioning at this point should not have any impact on distribution.

    • Cloud size changes do affect the agent distribution. So, installation or deletion of servers will have a major effect on distribution.
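
The before/after snapshot comparison described in this use case can be sketched as follows. It assumes each snapshot is recorded by hand as an agent-to-server dictionary; the function name and data layout are illustrative only.

```python
def compare_snapshots(before, after, maintenance_server):
    """Compare two agent -> connected-server snapshots.

    Returns (moved_unexpectedly, not_failed_over):
      moved_unexpectedly: agents that were NOT on the maintenance server
                          but are connected somewhere else afterwards
      not_failed_over:    agents still connected to the maintenance server
    Both lists should be empty if only the expected agents failed over.
    """
    moved_unexpectedly = sorted(
        agent for agent, server in before.items()
        if server != maintenance_server and after.get(agent) != server
    )
    not_failed_over = sorted(
        agent for agent, server in after.items()
        if server == maintenance_server
    )
    return moved_unexpectedly, not_failed_over


if __name__ == "__main__":
    before = {"agent-1": "server-1", "agent-2": "server-2", "agent-3": "server-1"}
    after = {"agent-1": "server-2", "agent-2": "server-2", "agent-3": "server-2"}
    print(compare_snapshots(before, after, "server-1"))
```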

verify server maint mode

  • use the HA admin console > servers section to put one server into maint mode

  • wait a few moments (agents will fail over to the remaining server in NORMAL mode)

  • use HA admin console > servers section to ensure all agents have connected to a single server

  • put the server back into NORMAL mode

  • after some time (should be no more than 1 hour) agents will switch back to their primary server.

    • if you want to speed up this test you can reduce the 1 hour setting in the agent configuration (rhq.agent.primary-server-switchover-check-interval-msecs) to, say, 10 minutes.

  • ensure all agents are connected to their primary server
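
Because the switchover interval property is expressed in milliseconds, the value for the shortened test has to be converted. A minimal sketch of the arithmetic is below; it only computes the number, and how the value is applied to the agent (configuration file or otherwise) is not covered by this page.

```python
from datetime import timedelta

# Property name taken from the step above; the value is in milliseconds.
PROPERTY = "rhq.agent.primary-server-switchover-check-interval-msecs"


def interval_msecs(**duration):
    """Convert a duration given as timedelta keyword arguments to milliseconds."""
    return int(timedelta(**duration).total_seconds() * 1000)


if __name__ == "__main__":
    print(f"1 hour setting: {PROPERTY}={interval_msecs(hours=1)}")
    print(f"reduced setting: {PROPERTY}={interval_msecs(minutes=10)}")
```

The 1 hour setting corresponds to 3600000 and the 10 minute setting to 600000.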

network blip / hiccup testing

  • temporarily block an agent from connecting to a server

    • firewall setting / port forwarding config / unplug one of them from the wall

  • if the block is short (~15 secs) verify that the agent does not fail over

    • use HA admin console to verify that agent failover history is unchanged

  • if the block is long (~1 min) verify that the agent fails over

    • use HA admin console to verify that agent failover history has new entries

ensure AG-aware agent load distribution

  • repeat "add server to the cloud", for a total of 3 servers in the cloud

  • log into HA admin console

  • click re-partition button

  • use other HA console UI pages to ensure that:

    • the agents are evenly load-distributed across all servers (should be 2 agents per server, if no affinity is assigned)

    • the failover list for each agent includes all servers (no servers should be listed in duplicate)
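
The even-distribution expectation above (2 agents per server with 6 agents, 3 servers, and no affinity) can be checked with a short count. This is a sketch under the assumption that the current agent-to-server connections have been noted from the HA console into a dictionary; "even" is taken here to mean that per-server counts differ by at most one, which is an assumption and not a statement about the actual balancing algorithm.

```python
from collections import Counter


def agents_per_server(connections, servers):
    """Count connected agents per server, including servers with zero agents."""
    counts = Counter({server: 0 for server in servers})
    counts.update(connections.values())
    return dict(counts)


def is_even(counts):
    """Treat the load as even when per-server counts differ by at most one."""
    return max(counts.values()) - min(counts.values()) <= 1


if __name__ == "__main__":
    servers = ["server-1", "server-2", "server-3"]
    connections = {
        "agent-1": "server-1", "agent-2": "server-1",
        "agent-3": "server-2", "agent-4": "server-2",
        "agent-5": "server-3", "agent-6": "server-3",
    }
    counts = agents_per_server(connections, servers)
    print(counts, is_even(counts))
```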

affinity groups (single-server membership)

note: affinity groups (AG) provide a mechanism for agents to prefer to connect to some servers over others.

  • log into HA admin console

  • assign 1 server to AG-1 and 2 agents to AG-1 (the rest of the agents/servers won't be in any AG)

    • make sure the agents you assign to AG-1 are currently NOT connected to the server put in AG-1

  • click re-partition button, and wait a while

    • You could wait quite a while for this (like a day), as agents do not "pull" a new server list very often. As an alternative you can:

      • Restart the agents

      • Use the new agent operation, via the GUI, to force (all of) the agents to update their lists (this is preferred as the agent keeps running).

  • ensure that the AG-1 agents are now connected to the AG-1 server

  • ensure the other 4 agents are evenly distributed across the remaining two non-AG servers

affinity groups (multi-server membership with failover)

  • repeat "add server to the cloud", for a total of 4 servers in the cloud

  • log into HA admin console

  • assign 2 more servers to AG-1 (now there are 3 servers in AG-1, and 1 server in no AG)

  • assign 2 more agents to AG-1 (now there are 4 agents in AG-1, and 2 agents in no AG)

  • click re-partition button, and wait a while

    • You could wait quite a while for this (like a day), as agents do not "pull" a new server list very often. As an alternative you can:

      • Restart the agents

      • Use the new agent operation, via the GUI, to force (all of) the agents to update their lists (this is preferred as the agent keeps running).

  • ensure that the 4 AG-1 agents are now connected to one of the 3 AG-1 servers

  • ensure the other 2 agents are connected to the non-AG server

  • put one of the AG-1 servers into MM

  • ensure that the 4 AG-1 agents are now connected to one of the 2 remaining AG-1 servers

  • ensure the other 2 agents are still the only ones connected to the non-AG server
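
The expected outcome of this use case (AG-1 agents on AG-1 servers, the remaining agents on the server with no AG) can be sketched as a check over hand-recorded data. The function and the dictionary layout are illustrative assumptions. The check models only the outcome expected in this test scenario; it is not the distribution algorithm itself, where affinity is weighted rather than absolute.

```python
def affinity_violations(connections, agent_groups, server_groups):
    """Return agents connected to a server whose affinity group differs
    from their own. An agent or server absent from its group mapping is
    treated as having no affinity group (None)."""
    violations = []
    for agent, server in connections.items():
        if agent_groups.get(agent) != server_groups.get(server):
            violations.append(agent)
    return sorted(violations)


if __name__ == "__main__":
    server_groups = {"server-1": "AG-1", "server-2": "AG-1", "server-3": "AG-1"}
    agent_groups = {"agent-1": "AG-1", "agent-2": "AG-1",
                    "agent-3": "AG-1", "agent-4": "AG-1"}
    connections = {
        "agent-1": "server-1", "agent-2": "server-2",
        "agent-3": "server-3", "agent-4": "server-1",
        "agent-5": "server-4", "agent-6": "server-4",  # server-4 has no AG
    }
    print(affinity_violations(connections, agent_groups, server_groups))
```

After putting one AG-1 server into MM, record the connections again and rerun the check; the AG-1 agents should still report no violations while connected only to the 2 remaining AG-1 servers.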

JBoss.org Content Archive (Read Only), exported from JBoss Community Documentation Editor at 2020-03-13 08:00:21 UTC, last content change 2013-09-18 19:41:31 UTC.